AITopics | optimal bias function

optimal bias function

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems | Dec-25-2025, 19:13:59 GMT

name change, regret minimization, reinforcement learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zihan Zhang, Xiangyang Ji

Neural Information Processing Systems | Sep-27-2025, 22:17:50 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, optimal bias function, reinforcement learning, (11 more...)

Country: North America (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

Neural Information Processing Systems | Aug-20-2025, 05:45:10 GMT

The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).

data mining, machine learning, reinforcement learning, (14 more...)

Country: North America > United States (0.14)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
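
The idea of an exploration bonus mentioned in this entry can be illustrated with a minimal sketch. It assumes a generic count-based bonus of the form scale / sqrt(N(s, a)); this is not necessarily the bonus used by the listed paper, and the class name `CountBasedBonus` and the parameter `scale` are hypothetical.

```python
import math


class CountBasedBonus:
    """Hypothetical helper: a bonus that shrinks as a state-action pair is visited more."""

    def __init__(self, scale=1.0):
        self.scale = scale   # assumed tuning constant, not taken from the paper
        self.counts = {}     # visit counts N(s, a)

    def update(self, state, action):
        """Record one visit to (state, action)."""
        key = (state, action)
        self.counts[key] = self.counts.get(key, 0) + 1

    def bonus(self, state, action):
        """Return scale / sqrt(max(N(s, a), 1))."""
        n = self.counts.get((state, action), 0)
        return self.scale / math.sqrt(max(n, 1))

    def optimistic_reward(self, state, action, estimated_reward):
        """Estimated reward plus bonus, used in place of the raw estimate when planning."""
        return estimated_reward + self.bonus(state, action)
```

Rarely visited pairs receive a larger bonus, so a planner that maximizes the optimistic reward is drawn toward them until their counts grow and the bonus decays.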

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zihan Zhang, Xiangyang Ji

Neural Information Processing Systems | Aug-19-2025, 21:52:09 GMT

Therefore, there is a trade-off between exploration and exploitation, i.e., taking actions we have not learned accurately enough and taking actions which

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: North America (0.15)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Reviews: Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems | Jan-26-2025, 02:11:46 GMT

The paper focuses on the important problem of designing optimal algorithms for exploration-exploitation (algorithms whose upper bound matches the lower bound). The paper is neither well organized nor well written. It is difficult to abstract from the mathematical formulation and grasp the key ideas behind the improvement of the regret bound. As far as I understood, the first important component in improving the bound is the use of variance-dependent confidence intervals (i.e., Bernstein). Together with the knowledge of H, this allows designing a tighter optimism (Eq.

optimal bias function, regret minimization, reinforcement learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)
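
The reviewer's remark about variance-dependent (Bernstein) confidence intervals can be made concrete with a small comparison. The sketch below assumes textbook forms of the Hoeffding bound and of the empirical Bernstein bound (the form commonly attributed to Maurer and Pontil); the constants and the function names are assumptions for illustration and are not taken from the reviewed paper.

```python
import math


def hoeffding_width(n, delta, value_range=1.0):
    """Hoeffding half-width for the mean of n samples lying in an interval of length value_range."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def empirical_bernstein_width(sample_var, n, delta, value_range=1.0):
    """Empirical Bernstein half-width: a variance term plus a lower-order range term (needs n >= 2)."""
    log_term = math.log(2.0 / delta)
    variance_term = math.sqrt(2.0 * sample_var * log_term / n)
    range_term = 7.0 * value_range * log_term / (3.0 * (n - 1))
    return variance_term + range_term


if __name__ == "__main__":
    n, delta = 1000, 0.05
    for var in (0.01, 0.25):
        print(var, hoeffding_width(n, delta), empirical_bernstein_width(var, n, delta))
```

When the sample variance is small relative to the range, the Bernstein width is noticeably narrower than the Hoeffding width; when the variance is near its maximum, the Bernstein width is slightly wider because of the extra range term.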

Reviews: Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems | Jan-26-2025, 01:56:38 GMT

This paper has led to a long and thoughtful discussion between the reviewers. The main points that were raised are the following: The results are novel and close a long-standing gap between upper and lower bounds in a very important problem. While the reviewers have agreed that the results are significant and definitely bring the field forward, an expert reviewer argued that the step forward is perhaps not big enough to warrant publication in the present form. However, after much discussion, the other reviewers made a strong case for acceptance, and all reviewers agreed that the community would clearly benefit from this paper being published. That said, I strongly encourage the authors to work hard on improving the presentation for the final version.

optimal bias function, reinforcement learning, reviewer, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems | Oct-10-2024, 14:59:51 GMT

We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle which is able to efficiently learn Reinforcement Learning (RL) problems modeled by a Markov decision process (MDP) with a finite state-action space. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SATH})$ (the symbol $\tilde{O}$ means $O$ with log factors ignored). Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SATH})$ [Jaksch et al., 2010] up to a logarithmic factor. As a consequence, we show that there is a near optimal regret bound of $\tilde{O}(\sqrt{DSAT})$ for MDPs with finite diameter $D$ compared to the lower bound of $\Omega(\sqrt{DSAT})$ [Jaksch et al., 2010].

optimal bias function, regret minimization, reinforcement learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zihan Zhang, Xiangyang Ji

Neural Information Processing Systems | Mar-18-2020, 21:32:22 GMT

We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle which is able to efficiently learn Reinforcement Learning (RL) problems modeled by a Markov decision process (MDP) with a finite state-action space. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SATH})$ (the symbol $\tilde{O}$ means $O$ with log factors ignored). This result outperforms the best previous regret bound of $\tilde{O}(HS\sqrt{AT})$ [Bartlett and Tewari, 2009] by a factor of $\sqrt{SH}$. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SATH})$ [Jaksch et al., 2010] up to a logarithmic factor. As a consequence, we show that there is a near optimal regret bound of $\tilde{O}(\sqrt{DSAT})$ for MDPs with finite diameter $D$ compared to the lower bound of $\Omega(\sqrt{DSAT})$ [Jaksch et al., 2010].

optimal bias function, regret minimization, reinforcement learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zihan Zhang, Xiangyang Ji

arXiv.org Machine Learning | Jun-14-2019

We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle which is able to efficiently learn Reinforcement Learning (RL) problems modeled by a Markov decision process (MDP) with a finite state-action space. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SAHT})$ for an MDP with $S$ states and $A$ actions, provided that an upper bound $H$ on the span of $h^{*}$, i.e., $\mathrm{sp}(h^{*})$, is known. This result outperforms the best previous regret bound of $\tilde{O}(HS\sqrt{AT})$ [Bartlett and Tewari, 2009] by a factor of $\sqrt{SH}$. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SAHT})$ [Jaksch et al., 2010] up to a logarithmic factor. As a consequence, we show that there is a near optimal regret bound of $\tilde{O}(\sqrt{SADT})$ for MDPs with finite diameter $D$ compared to the lower bound of $\Omega(\sqrt{SADT})$ [Jaksch et al., 2010].

machine learning, probability 1, reinforcement learning, (16 more...)

1906.0511

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
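
As a quick check of the improvement factor quoted in the abstract of this entry, the ratio of the previous bound to the new one can be worked out directly, with logarithmic factors ignored. The second line is a sketch of why the diameter bound follows; it assumes rewards in $[0,1]$ and the standard fact that $\mathrm{sp}(h^{*}) \le D$ in that setting, so that $H = D$ is a valid choice.

```latex
\[
\frac{HS\sqrt{AT}}{\sqrt{SAHT}}
  = \frac{HS\sqrt{AT}}{\sqrt{SH}\,\sqrt{AT}}
  = \frac{HS}{\sqrt{SH}}
  = \sqrt{SH}.
\]
\[
\mathrm{sp}(h^{*}) \le D
\quad\Longrightarrow\quad
H = D
\quad\Longrightarrow\quad
\tilde{O}\bigl(\sqrt{SAHT}\bigr) = \tilde{O}\bigl(\sqrt{SADT}\bigr).
\]
```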
